EDA
Individual EDA of Work Variations
Unemployment:
Factor w/ 4 levels "[0,5.3]","(5.3,7.9]",..: 2 4 2 3 1 3 3 3 3 2 ...

Min. 1st Qu. Median Mean 3rd Qu. Max. NA's
0.000 5.300 7.900 9.251 11.600 100.000 101


Professional:
Factor w/ 4 levels "[0,23.7]","(23.7,31.7]",..: 3 1 2 2 4 2 1 4 2 2 ...

Min. 1st Qu. Median Mean 3rd Qu. Max. NA's
0.00 23.70 31.70 33.23 41.80 100.00 105


Office:
Factor w/ 4 levels "[0,20.3]","(20.3,23.9]",..: 2 2 2 3 1 4 3 4 3 1 ...

Min. 1st Qu. Median Mean 3rd Qu. Max. NA's
0.00 20.30 23.90 24.12 27.70 100.00 105


Service:
Factor w/ 4 levels "[0,14.1]","(14.1,18.3]",..: 2 4 4 3 2 2 4 1 2 1 ...

Min. 1st Qu. Median Mean 3rd Qu. Max. NA's
0.00 14.10 18.30 19.65 24.00 100.00 105


Construction:
Factor w/ 4 levels "[0,5.4]","(5.4,8.7]",..: 3 3 3 2 1 2 3 2 2 3 ...

Min. 1st Qu. Median Mean 3rd Qu. Max. NA's
0.000 5.400 8.700 9.636 12.800 100.000 105


Production:
Factor w/ 4 levels "[0,7.7]","(7.7,12.3]",..: 3 4 3 3 3 3 3 1 3 4 ...

Min. 1st Qu. Median Mean 3rd Qu. Max. NA's
0.00 7.70 12.30 13.36 17.80 100.00 105


Self-employed:
Factor w/ 4 levels "[0,3.5]","(3.5,5.4]",..: 2 3 4 1 2 3 1 3 2 3 ...

Min. 1st Qu. Median Mean 3rd Qu. Max. NA's
0.000 3.500 5.400 6.109 7.900 100.000 105


Next, the seven work-variation variables (professional, production, unemployment, office, service, construction, and self-employed) were assessed for normality. Each variable was split into quartiles, and boxplots of income per capita were compared across them. Income decreased as the share of unemployment, service, construction, and production work in a census tract increased. For example, as the proportion of unemployed individuals in a tract rose, income per capita fell. Professional work was the only variation for which average income increased. Income for office and self-employed work remained relatively stable across quartiles. In the histograms, only the proportion of professionals appeared normally distributed; the remaining six work variations were all skewed to the right. The Q-Q plot for professionals supported normality: the points stayed close to the reference line, with very small left and right tails. The same cannot be said for the other variables, each of which had an oversized right tail and a relatively small left tail. Overall, the proportion of professionals appeared normally distributed while the other work variations did not.
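The quartile comparison described above can be sketched as follows. This is a minimal Python illustration, not the code that produced the R output in this report. It takes the first ten Unemployment and IncomePerCap values printed by str() later in this document and bins them with the cut points 5.3, 7.9, and 11.6 from the first summary in this section, assuming that summary describes Unemployment (the matching quartile codes suggest it does). The helper names `quartile_code` and `mean_income_by_bin` are hypothetical.

```python
from bisect import bisect_left
from statistics import mean

# First ten rows as printed by str() in this report.
unemployment = [5.4, 13.3, 6.2, 10.8, 4.2, 10.9, 11.4, 8.2, 8.7, 7.2]
income_per_cap = [25713, 18021, 20689, 24125, 27526,
                  30480, 20442, 32813, 24028, 24710]

# Quartile cut points (1st Qu., Median, 3rd Qu.) from the summary output.
# Assumption: that summary belongs to Unemployment.
breaks = [5.3, 7.9, 11.6]


def quartile_code(value, cut_points):
    """Return 1..4 using right-closed bins, e.g. (5.3, 7.9] maps to 2."""
    return bisect_left(cut_points, value) + 1


def mean_income_by_bin(values, incomes, cut_points):
    """Average income within each quartile bin of `values`."""
    groups = {}
    for value, income in zip(values, incomes):
        groups.setdefault(quartile_code(value, cut_points), []).append(income)
    return {code: mean(group) for code, group in sorted(groups.items())}


codes = [quartile_code(u, breaks) for u in unemployment]
bin_means = mean_income_by_bin(unemployment, income_per_cap, breaks)
print(codes)
print(bin_means)
```

The codes reproduce the first ten factor levels printed for that variable (2 4 2 3 1 3 3 3 3 2). Ten rows are far too few to show the income trend; the boxplots rely on all census tracts.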
Individual EDA of Ethnicities
Black:
Factor w/ 4 levels "[0,0.8]","(0.8,4]",..: 3 4 4 2 4 3 4 3 3 3 ...

Min. 1st Qu. Median Mean 3rd Qu. Max.
0.00 0.80 4.00 13.78 15.32 100.00


Hispanic:
Factor w/ 4 levels "[0,2.4]","(2.4,7.2]",..: 1 1 1 3 1 3 2 1 1 1 ...

Min. 1st Qu. Median Mean 3rd Qu. Max.
0.00 2.40 7.20 17.36 21.50 100.00


Asian:
Factor w/ 4 levels "[0,0.1]","(0.1,1.2]",..: 2 3 3 1 3 1 1 1 1 2 ...

Min. 1st Qu. Median Mean 3rd Qu. Max.
0.000 0.100 1.200 4.347 4.400 91.300


White:
Factor w/ 4 levels "[0,37.1]","(37.1,70.3]",..: 3 2 3 3 2 3 3 3 4 3 ...

Min. 1st Qu. Median Mean 3rd Qu. Max.
0.00 37.10 70.30 61.24 88.40 100.00


Native:
Min. 1st Qu. Median Mean 3rd Qu. Max.
0.0000 0.0000 0.0000 0.7567 0.4000 100.0000


Finally, the five ethnicity variables (Native, White, Black, Hispanic, and Asian) were investigated. The boxplots for White showed an increase in average income across the first, second, and third quartiles but no change in the fourth. The boxplots for Asian showed an increase from the first through the fourth quartile. For Hispanic, average income increased slightly between the first and second quartiles, did not change in the third, and decreased significantly in the fourth. For Black, average income increased between the first and second quartiles and then decreased from the second to the fourth. There were not enough responses for Native to construct a meaningful boxplot. Overall, average income appeared to change with the concentration of ethnicities in a census tract. The histogram for White was bimodal, with the highest frequency at over 8,000. The histograms for the other four ethnicities were skewed to the right. Based on the histograms, White had the most responses, followed by Hispanic, Black, Asian, and Native. The points on the Q-Q plot for each ethnicity variable followed a curve with large left and right tails, and the Native Q-Q plot showed a clear pattern along the line, implying non-normality. Therefore, based on the boxplots, histograms, and Q-Q plots, none of the ethnicity variables appeared normally distributed.
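As a rough numeric companion to the histograms, the quartiles reported in the summaries of this section can be turned into Bowley's quartile skewness, (Q3 + Q1 - 2 * median) / (Q3 - Q1). The sketch below is illustrative Python, not part of the original R analysis. The variable labels are inferred by matching the summaries against the values printed by str(), so they are assumptions, and `bowley_skewness` is a hypothetical helper name. The measure only looks at the middle 50% of the data, so it complements the histograms and Q-Q plots rather than replacing them.

```python
# (Q1, median, Q3) as reported in the summaries of this section.
# Assumption: labels inferred from matching values in the str() output.
quartiles = {
    "Black": (0.80, 4.00, 15.32),
    "Hispanic": (2.40, 7.20, 21.50),
    "Asian": (0.10, 1.20, 4.40),
    "White": (37.10, 70.30, 88.40),
    "Native": (0.00, 0.00, 0.40),
}


def bowley_skewness(q1, median, q3):
    """Quartile skewness: positive means the upper half is more spread out."""
    return (q3 + q1 - 2 * median) / (q3 - q1)


skew = {name: bowley_skewness(*q) for name, q in quartiles.items()}
for name, value in skew.items():
    print(f"{name}: {value:+.2f}")
```

Under these assumptions the value is positive for Black, Hispanic, Asian, and Native, which agrees with the right skew seen in their histograms, and negative for White.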
'data.frame': 69672 obs. of 11 variables:
$ Hispanic : num 0.9 0.8 0 10.5 0.7 13.1 3.8 1.3 1.4 0.4 ...
$ White : num 87.4 40.4 74.5 82.8 68.5 72.9 74.5 84 89.5 85.5 ...
$ Black : num 7.7 53.3 18.6 3.7 24.8 11.9 19.7 10.7 8.4 12.1 ...
$ Asian : num 0.6 2.3 1.4 0 3.8 0 0 0 0 0.3 ...
$ Professional: num 34.7 22.3 31.4 27 49.6 24.2 19.5 42.8 31.5 29.3 ...
$ Service : num 17 24.7 24.9 20.8 14.2 17.5 29.6 10.7 17.5 13.7 ...
$ Office : num 21.3 21.5 22.1 27 18.2 35.4 25.3 34.2 26.1 17.7 ...
$ Construction: num 11.9 9.4 9.2 8.7 2.1 7.9 10.1 5.5 7.8 11 ...
$ Production : num 15.2 22 12.4 16.4 15.8 14.9 15.5 6.8 17.1 28.3 ...
$ Unemployment: num 5.4 13.3 6.2 10.8 4.2 10.9 11.4 8.2 8.7 7.2 ...
$ IncomePerCap: int 25713 18021 20689 24125 27526 30480 20442 32813 24028 24710 ...
[1] 626
[1] 0
[1] 11
'data.frame': 69567 obs. of 11 variables:
$ Hispanic : num 0.9 0.8 0 10.5 0.7 13.1 3.8 1.3 1.4 0.4 ...
$ White : num 87.4 40.4 74.5 82.8 68.5 72.9 74.5 84 89.5 85.5 ...
$ Black : num 7.7 53.3 18.6 3.7 24.8 11.9 19.7 10.7 8.4 12.1 ...
$ Asian : num 0.6 2.3 1.4 0 3.8 0 0 0 0 0.3 ...
$ Professional: num 34.7 22.3 31.4 27 49.6 24.2 19.5 42.8 31.5 29.3 ...
$ Service : num 17 24.7 24.9 20.8 14.2 17.5 29.6 10.7 17.5 13.7 ...
$ Office : num 21.3 21.5 22.1 27 18.2 35.4 25.3 34.2 26.1 17.7 ...
$ Construction: num 11.9 9.4 9.2 8.7 2.1 7.9 10.1 5.5 7.8 11 ...
$ Production : num 15.2 22 12.4 16.4 15.8 14.9 15.5 6.8 17.1 28.3 ...
$ Unemployment: num 5.4 13.3 6.2 10.8 4.2 10.9 11.4 8.2 8.7 7.2 ...
$ IncomePerCap: int 25713 18021 20689 24125 27526 30480 20442 32813 24028 24710 ...
- attr(*, "na.action")= 'omit' Named int 1484 1807 2299 2499 2789 4259 4444 4448 4449 4477 ...
..- attr(*, "names")= chr "1514" "1851" "2370" "2574" ...
PCA

Hispanic White Black Asian Professional Service Office
Hispanic 1.000 -0.657 -0.128 0.044 -0.331 0.272 -0.006
White -0.657 1.000 -0.579 -0.260 0.346 -0.464 -0.019
Black -0.128 -0.579 1.000 -0.100 -0.229 0.364 0.043
Asian 0.044 -0.260 -0.100 1.000 0.239 -0.043 -0.009
Professional -0.331 0.346 -0.229 0.239 1.000 -0.609 -0.131
Service 0.272 -0.464 0.364 -0.043 -0.609 1.000 -0.145
Office -0.006 -0.019 0.043 -0.009 -0.131 -0.145 1.000
Construction Production Unemployment
Hispanic 0.254 0.109 0.212
White -0.015 -0.100 -0.484
Black -0.161 0.116 0.471
Asian -0.221 -0.201 -0.091
Professional -0.496 -0.651 -0.434
Service 0.028 0.117 0.448
Office -0.238 -0.201 0.032
[ reached getOption("max.print") -- omitted 3 rows ]
Hispanic White Black Asian Professional
Hispanic 545.4011283 -476.691379 -66.397180 8.9465601 -104.28397
White -476.6913785 965.201153 -398.350924 -69.8086312 144.90996
Black -66.3971800 -398.350924 490.216058 -19.2051491 -68.18805
Asian 8.9465601 -69.808631 -19.205149 74.8157824 27.79961
Professional -104.2839673 144.909962 -68.188051 27.7996077 181.46453
Service 50.8005134 -115.378194 64.459823 -2.9669891 -65.72219
Office -0.7871622 -3.437411 5.610685 -0.4386444 -10.31925
Service Office Construction Production Unemployment
Hispanic 50.800513 -0.7871622 35.187809 19.085671 29.376338
White -115.378194 -3.4374111 -2.840299 -23.253782 -89.160921
Black 64.459823 5.6106854 -21.167123 19.280331 61.880556
Asian -2.966989 -0.4386444 -11.369500 -13.022798 -4.666184
Professional -65.722186 -10.3192466 -39.714390 -65.704931 -34.680751
Service 64.143427 -6.7633651 1.343278 6.999397 21.278983
Office -6.763365 34.1060908 -8.242961 -8.779620 1.095836
[ reached getOption("max.print") -- omitted 3 rows ]
Importance of components:
PC1 PC2 PC3 PC4 PC5 PC6 PC7
Standard deviation 1.7878 1.3389 1.1653 1.0355 0.88819 0.82267 0.76933
Proportion of Variance 0.3196 0.1792 0.1358 0.1072 0.07889 0.06768 0.05919
Cumulative Proportion 0.3196 0.4989 0.6347 0.7419 0.82078 0.88845 0.94764
PC8 PC9 PC10
Standard deviation 0.71304 0.12303 0.003342
Proportion of Variance 0.05084 0.00151 0.000000
Cumulative Proportion 0.99849 1.00000 1.000000
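The "Proportion of Variance" and "Cumulative Proportion" rows follow directly from the standard deviations: each component's variance is its standard deviation squared, divided by the sum of all variances. The short Python sketch below recomputes them from the printed values as a check. It is only an illustration, and the remark that the variances sum to about 10 assumes the PCA was run on ten standardized variables, which the printed numbers suggest but the output does not state.

```python
# Standard deviations of PC1..PC10 as printed in "Importance of components".
sdev = [1.7878, 1.3389, 1.1653, 1.0355, 0.88819,
        0.82267, 0.76933, 0.71304, 0.12303, 0.003342]

variances = [s ** 2 for s in sdev]   # eigenvalues of the analysed matrix
total = sum(variances)               # about 10 if ten standardized inputs
proportion = [v / total for v in variances]

# Running sum of the proportions gives the cumulative row.
cumulative = []
running = 0.0
for p in proportion:
    running += p
    cumulative.append(running)

for i, (p, c) in enumerate(zip(proportion, cumulative), start=1):
    print(f"PC{i}: proportion={p:.4f} cumulative={c:.4f}")
```

The first four components together reach roughly 74% of the variance, matching the printed cumulative value of 0.7419.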
PC1 PC2 PC3 PC4 PC5
Hispanic 0.29824667 -0.01361924 0.61175592 -0.22874996 -0.202137004
White -0.41731343 0.37316978 -0.26669236 -0.02170213 0.001281164
Black 0.29691738 -0.34563121 -0.45079386 0.18397271 -0.035677561
Asian -0.07602835 -0.38415017 0.43937912 0.18524481 0.610498550
Professional -0.46156573 -0.28644972 0.07425599 0.23265934 -0.203398965
Service 0.40338607 -0.11379011 -0.04473355 0.16595822 -0.238779921
Office -0.02918079 -0.21748570 -0.16640164 -0.88655225 0.152863255
PC6 PC7 PC8 PC9 PC10
Hispanic 0.18631155 -0.393126083 0.003871628 0.504139978 -2.330305e-05
White -0.30458284 0.004065075 0.229528223 0.685219633 -1.885794e-05
Black 0.34118441 0.222931142 -0.394503125 0.481982753 -1.032280e-05
Asian -0.21635417 0.383917067 0.093476318 0.208880218 -1.178804e-05
Professional 0.29065544 -0.118425431 0.128229850 -0.001487139 6.992409e-01
Service -0.73848184 -0.074734416 -0.118114735 0.007694364 4.157180e-01
Office -0.05646975 0.132714775 -0.045973528 -0.002938736 3.031380e-01
[ reached getOption("max.print") -- omitted 3 rows ]


PC1 PC2 PC3 PC4
Min. :-5.4230 Min. :-8.10196 Min. :-4.8516 Min. :-13.66580
1st Qu.:-1.2752 1st Qu.:-0.88018 1st Qu.:-0.6341 1st Qu.: -0.62140
Median :-0.3129 Median : 0.02552 Median :-0.1991 Median : 0.02273
Mean : 0.0000 Mean : 0.00000 Mean : 0.0000 Mean : 0.00000
3rd Qu.: 1.0789 3rd Qu.: 0.95765 3rd Qu.: 0.5110 3rd Qu.: 0.64724
Max. :10.4288 Max. : 9.79218 Max. : 6.3807 Max. : 5.32333
PC5 PC6 PC7 PC8
Min. :-4.4289 Min. :-8.51468 Min. :-5.8693 Min. :-3.99610
1st Qu.:-0.5584 1st Qu.:-0.46560 1st Qu.:-0.4500 1st Qu.:-0.39751
Median :-0.0921 Median : 0.02195 Median :-0.0353 Median :-0.02785
Mean : 0.0000 Mean : 0.00000 Mean : 0.0000 Mean : 0.00000
3rd Qu.: 0.4421 3rd Qu.: 0.49022 3rd Qu.: 0.4223 3rd Qu.: 0.36326
Max. : 8.3336 Max. : 6.46528 Max. :10.8685 Max. :11.38794
PC9 PC10
Min. :-2.08722 Min. :-1.046e-02
1st Qu.:-0.01512 1st Qu.:-3.385e-05
Median : 0.02115 Median :-8.511e-06
Mean : 0.00000 Mean : 0.000e+00
3rd Qu.: 0.04651 3rd Qu.: 2.566e-05
Max. : 0.29456 Max. : 1.048e-02
PC1 PC2 PC3 PC4 PC5
PC1 1.000000e+00 -3.229650e-15 -2.721219e-16 1.871797e-15 4.359679e-15
PC2 -3.229650e-15 1.000000e+00 3.697138e-15 -2.377401e-15 -3.444515e-15
PC3 -2.721219e-16 3.697138e-15 1.000000e+00 -3.061978e-15 -2.156517e-17
PC4 1.871797e-15 -2.377401e-15 -3.061978e-15 1.000000e+00 3.035937e-15
PC5 4.359679e-15 -3.444515e-15 -2.156517e-17 3.035937e-15 1.000000e+00
PC6 -1.446371e-15 4.523205e-15 2.506401e-15 -4.865942e-15 -6.599124e-15
PC7 1.693192e-15 -5.820379e-15 -2.483534e-15 8.687901e-17 -2.721381e-15
PC6 PC7 PC8 PC9 PC10
PC1 -1.446371e-15 1.693192e-15 -3.551669e-15 4.151794e-15 7.572904e-13
PC2 4.523205e-15 -5.820379e-15 -2.470873e-15 -6.254922e-14 1.718849e-13
PC3 2.506401e-15 -2.483534e-15 2.606976e-15 -1.085432e-13 4.459767e-13
PC4 -4.865942e-15 8.687901e-17 2.406897e-15 5.094873e-14 -1.366731e-12
PC5 -6.599124e-15 -2.721381e-15 5.057319e-15 3.157097e-14 1.250314e-12
PC6 1.000000e+00 -2.558483e-16 -9.819399e-16 9.526392e-15 -1.459345e-12
PC7 -2.558483e-16 1.000000e+00 2.078380e-15 6.076442e-14 3.275222e-13
[ reached getOption("max.print") -- omitted 3 rows ]
PC1 PC2 PC3 PC4 PC5
PC1 3.196104e+00 -7.730326e-15 -5.669127e-16 3.465187e-15 6.922661e-15
PC2 -7.730326e-15 1.792519e+00 5.768193e-15 -3.296035e-15 -4.096077e-15
PC3 -5.669127e-16 5.768193e-15 1.357952e+00 -3.694893e-15 -2.232046e-17
PC4 3.465187e-15 -3.296035e-15 -3.694893e-15 1.072297e+00 2.792276e-15
PC5 6.922661e-15 -4.096077e-15 -2.232046e-17 2.792276e-15 7.888895e-01
PC6 -2.127232e-15 4.981990e-15 2.402799e-15 -4.145235e-15 -4.821910e-15
PC7 2.328799e-15 -5.995129e-15 -2.226526e-15 6.921300e-17 -1.859571e-15
PC6 PC7 PC8 PC9 PC10
PC1 -2.127232e-15 2.328799e-15 -4.527510e-15 9.131644e-16 4.524235e-15
PC2 4.981990e-15 -5.995129e-15 -2.358842e-15 -1.030283e-14 7.690273e-16
PC3 2.402799e-15 -2.226526e-15 2.166185e-15 -1.556136e-14 1.736707e-15
PC4 -4.145235e-15 6.921300e-17 1.777180e-15 6.490730e-15 -4.729472e-15
PC5 -4.821910e-15 -1.859571e-15 3.202911e-15 3.449838e-15 3.711072e-15
PC6 6.767830e-01 -1.619282e-16 -5.760047e-16 9.641750e-16 -4.011944e-15
PC7 -1.619282e-16 5.918759e-01 1.140136e-15 5.751318e-15 8.420314e-16
[ reached getOption("max.print") -- omitted 3 rows ]


Call:
lm(formula = IncomePerCap ~ ., data = pcadata_pcr_rot)
Residuals:
Min 1Q Median 3Q Max
-57889 -3154 -136 3093 39355
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 26167.82 20.93 1250.463 < 2e-16 ***
PC1 -4585.05 11.71 -391.701 < 2e-16 ***
PC2 -1454.29 15.63 -93.043 < 2e-16 ***
PC3 604.54 17.96 33.664 < 2e-16 ***
PC4 994.55 20.21 49.214 < 2e-16 ***
PC5 -878.20 23.56 -37.274 < 2e-16 ***
PC6 1377.18 25.44 54.140 < 2e-16 ***
PC7 -205.74 27.20 -7.564 3.96e-14 ***
PC8 -196.99 29.35 -6.712 1.93e-11 ***
PC9 3301.06 170.10 19.407 < 2e-16 ***
PC10 -3519.28 6262.21 -0.562 0.574
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 5519 on 69556 degrees of freedom
Multiple R-squared: 0.7102, Adjusted R-squared: 0.7101
F-statistic: 1.704e+04 on 10 and 69556 DF, p-value: < 2.2e-16
Data: X dimension: 69567 10
Y dimension: 69567 1
Fit method: svdpc
Number of components considered: 10
VALIDATION: RMSEP
Cross-validated using 10 random segments.
(Intercept) 1 comps 2 comps 3 comps 4 comps 5 comps 6 comps
CV 10252 6157 5841 5799 5707 5653 5539
adjCV 10252 6157 5841 5799 5707 5653 5538
7 comps 8 comps 9 comps 10 comps
CV 5536 5535 5520 5520
adjCV 5536 5535 5520 5520
TRAINING: % variance explained
1 comps 2 comps 3 comps 4 comps 5 comps 6 comps 7 comps
X 31.96 49.89 63.47 74.19 82.08 88.85 94.76
IncomePerCap 63.93 67.54 68.01 69.02 69.60 70.82 70.84
8 comps 9 comps 10 comps
X 99.85 100.00 100.00
IncomePerCap 70.86 71.02 71.02
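One way to read the "% variance explained" table is to difference the cumulative IncomePerCap row, which shows how much each additional component contributes. The Python sketch below does this with the printed values; it is an illustration of the arithmetic only, not the procedure used to fit the model.

```python
# Cumulative % of IncomePerCap variance explained by 1..10 components,
# copied from the "TRAINING: % variance explained" table.
cumulative_r2 = [63.93, 67.54, 68.01, 69.02, 69.60,
                 70.82, 70.84, 70.86, 71.02, 71.02]

# Contribution of each component = difference between consecutive totals.
increments = [cumulative_r2[0]] + [
    round(b - a, 2) for a, b in zip(cumulative_r2, cumulative_r2[1:])
]

# Share of the final explained variance that comes from PC1 alone.
share_pc1 = cumulative_r2[0] / cumulative_r2[-1]

for i, inc in enumerate(increments, start=1):
    print(f"PC{i}: +{inc:.2f} points")
print(f"PC1 share of explained variance: {share_pc1:.1%}")
```

The first component supplies about 90% of the variance the full model explains, and the tenth adds nothing at this precision, which is consistent with its near-zero standard deviation (0.003342) and its non-significant coefficient (p = 0.574) in the regression output.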


K-Means
List of 9
$ cluster : Named int [1:69567] 2 2 2 2 2 1 2 1 2 2 ...
..- attr(*, "names")= chr [1:69567] "1" "2" "3" "4" ...
$ centers : num [1:2, 1:11] -0.329 0.174 0.42 -0.222 -0.32 ...
..- attr(*, "dimnames")=List of 2
.. ..$ : chr [1:2] "1" "2"
.. ..$ : chr [1:11] "Hispanic" "White" "Black" "Asian" ...
$ totss : num 7.31e+12
$ withinss : num [1:2] 1.15e+12 1.34e+12
$ tot.withinss: num 2.5e+12
$ betweenss : num 4.81e+12
$ size : int [1:2] 24086 45481
$ iter : int 1
$ ifault : int 0
- attr(*, "class")= chr "kmeans"
K-means clustering with 2 clusters of sizes 24086, 45481
Cluster means:
Hispanic White Black Asian Professional Service
1 -0.3285506 0.4200874 -0.3199683 0.2372900 0.9282858 -0.6150829
2 0.1739950 -0.2224715 0.1694500 -0.1256649 -0.4916051 0.3257379
Office Construction Production Unemployment IncomePerCap
1 -0.01890396 -0.4302842 -0.6556146 -0.5268802 37598.33
2 0.01001123 0.2278715 0.3472029 0.2790272 20114.40
Clustering vector:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26
2 2 2 2 2 1 2 1 2 2 2 2 2 2 2 1 2 2 1 1 2 2 1 2 2 2
27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 45 46 47 48 49 50 51 52 53
2 2 1 1 1 1 1 2 1 1 2 1 1 2 2 2 2 2 2 2 2 2 2 2 2 2
54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76
2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
[ reached getOption("max.print") -- omitted 69492 entries ]
Within cluster sum of squares by cluster:
[1] 1.153649e+12 1.344211e+12
(between_SS / total_SS = 65.8 %)
Available components:
[1] "cluster" "centers" "totss" "withinss" "tot.withinss"
[6] "betweenss" "size" "iter" "ifault"

List of 9
$ cluster : Named int [1:69567] 2 1 1 2 2 2 1 2 2 2 ...
..- attr(*, "names")= chr [1:69567] "1" "2" "3" "4" ...
$ centers : num [1:3, 1:11] 0.432 -0.24 -0.363 -0.575 0.34 ...
..- attr(*, "dimnames")=List of 2
.. ..$ : chr [1:3] "1" "2" "3"
.. ..$ : chr [1:11] "Hispanic" "White" "Black" "Asian" ...
$ totss : num 7.31e+12
$ withinss : num [1:3] 4.33e+11 3.94e+11 3.93e+11
$ tot.withinss: num 1.22e+12
$ betweenss : num 6.09e+12
$ size : int [1:3] 27175 29638 12754
$ iter : int 2
$ ifault : int 0
- attr(*, "class")= chr "kmeans"
K-means clustering with 3 clusters of sizes 27175, 29638, 12754
Cluster means:
Hispanic White Black Asian Professional Service
1 0.4318377 -0.5751979 0.3952287 -0.15378102 -0.7689991 0.6127621
2 -0.2397814 0.3399978 -0.2076592 -0.02410908 0.1329131 -0.2111302
3 -0.3629095 0.4354829 -0.3595529 0.38368702 1.3296436 -0.8149861
Office Construction Production Unemployment IncomePerCap
1 -0.02605785 0.2974405 0.51204618 0.6194822 16576.79
2 0.07327428 0.0108065 -0.07894429 -0.3101802 27821.96
3 -0.11475466 -0.6588701 -0.90756657 -0.5991303 42759.54
Clustering vector:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26
2 1 1 2 2 2 1 2 2 2 2 1 1 1 2 2 2 1 3 3 2 2 2 1 2 2
27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 45 46 47 48 49 50 51 52 53
1 1 2 2 3 2 3 1 3 3 2 2 2 2 1 1 2 1 1 1 1 1 1 1 1 1
54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76
1 1 2 1 1 1 1 1 1 1 1 2 1 1 1 1 1 2 1 1 1 1 1
[ reached getOption("max.print") -- omitted 69492 entries ]
Within cluster sum of squares by cluster:
[1] 433025278355 393691127581 392888818743
(between_SS / total_SS = 83.3 %)
Available components:
[1] "cluster" "centers" "totss" "withinss" "tot.withinss"
[6] "betweenss" "size" "iter" "ifault"

List of 9
$ cluster : Named int [1:69567] 4 2 4 4 4 1 4 1 4 4 ...
..- attr(*, "names")= chr [1:69567] "1" "2" "3" "4" ...
$ centers : num [1:4, 1:11] -0.302 0.668 -0.374 -0.132 0.407 ...
..- attr(*, "dimnames")=List of 2
.. ..$ : chr [1:4] "1" "2" "3" "4"
.. ..$ : chr [1:11] "Hispanic" "White" "Black" "Asian" ...
$ totss : num 7.31e+12
$ withinss : num [1:4] 1.76e+11 1.92e+11 1.73e+11 1.75e+11
$ tot.withinss: num 7.15e+11
$ betweenss : num 6.6e+12
$ size : int [1:4] 17574 17698 8266 26029
$ iter : int 2
$ ifault : int 0
- attr(*, "class")= chr "kmeans"
K-means clustering with 4 clusters of sizes 17574, 17698, 8266, 26029
Cluster means:
Hispanic White Black Asian Professional Service
1 -0.3016974 0.4066104 -0.28173021 0.1074823 0.5711741 -0.43512866
2 0.6680011 -0.8809881 0.57590840 -0.1651807 -0.9293827 0.83254425
3 -0.3738848 0.4389987 -0.37976189 0.4559152 1.5331982 -0.92442949
4 -0.1317654 0.1850702 -0.08076331 -0.1050413 -0.2406168 0.02128076
Office Construction Production Unemployment IncomePerCap
1 0.06629532 -0.2290380 -0.4319137 -0.46098899 32840.09
2 -0.05484069 0.3137655 0.5749681 0.89883620 14433.73
3 -0.17952039 -0.7693828 -1.0184110 -0.62970642 45780.77
4 0.04953752 0.1856318 0.2240905 -0.09992813 23412.84
Clustering vector:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26
4 2 4 4 4 1 4 1 4 4 4 4 4 2 4 1 4 2 1 1 4 4 1 2 4 4
27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 45 46 47 48 49 50 51 52 53
2 4 1 1 3 1 1 4 1 3 4 1 1 4 4 4 4 4 2 2 2 2 2 4 4 2
54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76
2 4 4 2 4 4 4 2 2 2 4 4 4 2 2 4 4 4 2 4 2 4 4
[ reached getOption("max.print") -- omitted 69492 entries ]
Within cluster sum of squares by cluster:
[1] 175536566241 191571930523 172553341760 175374034150
(between_SS / total_SS = 90.2 %)
Available components:
[1] "cluster" "centers" "totss" "withinss" "tot.withinss"
[6] "betweenss" "size" "iter" "ifault"

List of 9
$ cluster : Named int [1:69567] 5 1 1 1 5 5 1 4 1 5 ...
..- attr(*, "names")= chr [1:69567] "1" "2" "3" "4" ...
$ centers : num [1:5, 1:11] -0.00846 0.85002 -0.3769 -0.33227 -0.24884 ...
..- attr(*, "dimnames")=List of 2
.. ..$ : chr [1:5] "1" "2" "3" "4" ...
.. ..$ : chr [1:11] "Hispanic" "White" "Black" "Asian" ...
$ totss : num 7.31e+12
$ withinss : num [1:5] 9.11e+10 1.06e+11 9.67e+10 9.02e+10 8.75e+10
$ tot.withinss: num 4.72e+11
$ betweenss : num 6.84e+12
$ size : int [1:5] 20760 12813 6192 11579 18223
$ iter : int 2
$ ifault : int 0
- attr(*, "class")= chr "kmeans"
K-means clustering with 5 clusters of sizes 20760, 12813, 6192, 11579, 18223
Cluster means:
Hispanic White Black Asian Professional Service
1 -0.008456987 -0.01428733 0.06737527 -0.12483068 -0.4552929 0.2049365
2 0.850017949 -1.08804866 0.68083922 -0.17734851 -1.0217088 0.9792310
3 -0.376900236 0.44391016 -0.39300833 0.48161313 1.6342530 -0.9824220
4 -0.332267736 0.42488521 -0.31698657 0.22115974 0.8637778 -0.5739916
5 -0.248840396 0.36049690 -0.22051300 -0.03726641 0.1329121 -0.2234518
Office Construction Production Unemployment IncomePerCap
1 0.02032925 0.25592838 0.38069044 0.09861782 20788.54
2 -0.07442558 0.32126962 0.59350367 1.09913874 13091.47
3 -0.21879800 -0.82200303 -1.06575585 -0.64529647 47520.79
4 0.02701923 -0.40030391 -0.64316728 -0.52575884 36236.88
5 0.08634809 0.01621363 -0.08018998 -0.33184070 27836.80
Clustering vector:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26
5 1 1 1 5 5 1 4 1 5 1 1 1 1 5 5 1 2 4 4 5 5 5 1 1 1
27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 45 46 47 48 49 50 51 52 53
1 1 5 5 4 4 4 1 4 3 1 5 5 1 1 1 5 1 2 1 2 1 2 1 1 1
54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76
1 1 1 2 1 1 1 1 1 1 1 5 1 2 2 1 1 5 1 1 2 1 1
[ reached getOption("max.print") -- omitted 69492 entries ]
Within cluster sum of squares by cluster:
[1] 91107426890 106355408998 96685021931 90198216197 87531045107
(between_SS / total_SS = 93.5 %)
Available components:
[1] "cluster" "centers" "totss" "withinss" "tot.withinss"
[6] "betweenss" "size" "iter" "ifault"

List of 9
$ cluster : Named int [1:69567] 6 4 4 6 6 2 4 2 6 6 ...
..- attr(*, "names")= chr [1:69567] "1" "2" "3" "4" ...
$ centers : num [1:6, 1:11] -0.349 -0.285 0.974 0.132 -0.385 ...
..- attr(*, "dimnames")=List of 2
.. ..$ : chr [1:6] "1" "2" "3" "4" ...
.. ..$ : chr [1:11] "Hispanic" "White" "Black" "Asian" ...
$ totss : num 7.31e+12
$ withinss : num [1:6] 5.33e+10 5.39e+10 6.79e+10 5.20e+10 5.61e+10 ...
$ tot.withinss: num 3.36e+11
$ betweenss : num 6.98e+12
$ size : int [1:6] 8294 13096 9901 16290 4763 17223
$ iter : int 3
$ ifault : int 0
- attr(*, "class")= chr "kmeans"
K-means clustering with 6 clusters of sizes 8294, 13096, 9901, 16290, 4763, 17223
Cluster means:
Hispanic White Black Asian Professional Service
1 -0.3494717 0.4296992 -0.3360808 0.30654695 1.0897348 -0.68272359
2 -0.2854601 0.3976771 -0.2665621 0.05425283 0.4263884 -0.36762711
3 0.9739417 -1.2113147 0.7268560 -0.18905224 -1.0768116 1.07158297
4 0.1316581 -0.2288132 0.2195608 -0.13438783 -0.6075451 0.36540166
5 -0.3852568 0.4492850 -0.4005145 0.50197893 1.7117093 -1.02643109
6 -0.1925231 0.2792048 -0.1502203 -0.09190832 -0.1287054 -0.06945889
Office Construction Production Unemployment IncomePerCap
1 -0.029446162 -0.5329557 -0.7839551 -0.5661097 38948.40
2 0.089116820 -0.1457015 -0.3276215 -0.4288697 31179.25
3 -0.088194644 0.3291525 0.5983303 1.2390235 12157.33
4 0.007625546 0.2812103 0.4727567 0.2799036 18933.00
5 -0.254726418 -0.8568239 -1.1023498 -0.6538796 48913.98
6 0.060350087 0.1491981 0.1403863 -0.1974674 24809.23
Clustering vector:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26
6 4 4 6 6 2 4 2 6 6 6 4 4 4 6 2 6 3 1 1 6 6 2 4 6 6
27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 45 46 47 48 49 50 51 52 53
4 4 2 2 1 2 1 4 1 5 6 2 2 6 4 4 6 4 3 4 3 4 3 4 4 4
54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76
4 4 6 4 4 4 4 4 4 4 4 6 4 4 4 4 4 6 4 4 3 4 6
[ reached getOption("max.print") -- omitted 69492 entries ]
Within cluster sum of squares by cluster:
[1] 53289943534 53860735274 67861554081 51981016101 56119303347 52399097534
(between_SS / total_SS = 95.4 %)
Available components:
[1] "cluster" "centers" "totss" "withinss" "tot.withinss"
[6] "betweenss" "size" "iter" "ifault"

List of 9
$ cluster : Named int [1:69567] 2 5 7 7 2 2 7 3 7 7 ...
..- attr(*, "names")= chr [1:69567] "1" "2" "3" "4" ...
$ centers : num [1:7, 1:11] -0.396 -0.258 -0.313 1.054 0.237 ...
..- attr(*, "dimnames")=List of 2
.. ..$ : chr [1:7] "1" "2" "3" "4" ...
.. ..$ : chr [1:11] "Hispanic" "White" "Black" "Asian" ...
$ totss : num 7.31e+12
$ withinss : num [1:7] 3.01e+10 3.39e+10 3.39e+10 5.06e+10 3.68e+10 ...
$ tot.withinss: num 2.52e+11
$ betweenss : num 7.06e+12
$ size : int [1:7] 3586 13111 9445 8258 13680 6091 15396
$ iter : int 2
$ ifault : int 0
- attr(*, "class")= chr "kmeans"
K-means clustering with 7 clusters of sizes 3586, 13111, 9445, 8258, 13680, 6091, 15396
Cluster means:
Hispanic White Black Asian Professional Service
1 -0.3964506 0.4542494 -0.40764714 0.5282089 1.7925081 -1.0717282
2 -0.2576300 0.3733856 -0.23385201 -0.0269068 0.1755536 -0.2474022
3 -0.3125257 0.4200419 -0.30528640 0.1527982 0.6907586 -0.4893448
4 1.0539699 -1.2761352 0.73515083 -0.1940444 -1.1025060 1.1203953
5 0.2366030 -0.3958078 0.34172930 -0.1373108 -0.7010158 0.4859066
6 -0.3575666 0.4294729 -0.34914245 0.3685170 1.2778561 -0.7833195
Office Construction Production Unemployment IncomePerCap
1 -0.29189615 -0.895033912 -1.1399021 -0.65954880 50244.25
2 0.08656937 -0.003686199 -0.1156669 -0.35053069 28266.51
3 0.06459196 -0.301790762 -0.5299635 -0.49757889 34274.90
4 -0.08602009 0.329565942 0.5903558 1.33577584 11570.50
5 -0.02053170 0.292153979 0.5252348 0.41200162 17787.86
6 -0.07512151 -0.640358239 -0.8939002 -0.59387071 41486.24
[ reached getOption("max.print") -- omitted 1 row ]
Clustering vector:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26
2 5 7 7 2 2 7 3 7 7 7 7 5 5 7 2 7 5 3 3 2 2 2 5 7 7
27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 45 46 47 48 49 50 51 52 53
5 5 2 2 6 3 3 7 3 6 7 2 3 7 5 5 2 5 4 5 5 5 4 5 7 5
54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76
5 5 7 5 5 7 7 5 5 5 5 2 5 5 5 7 5 2 5 7 4 5 7
[ reached getOption("max.print") -- omitted 69492 entries ]
Within cluster sum of squares by cluster:
[1] 30078367255 33918666154 33924690315 50620627295 36763050021 31979820579
[7] 34290620831
(between_SS / total_SS = 96.6 %)
Available components:
[1] "cluster" "centers" "totss" "withinss" "tot.withinss"
[6] "betweenss" "size" "iter" "ifault"

List of 9
$ cluster : Named int [1:69567] 1 8 7 1 1 5 7 5 1 1 ...
..- attr(*, "names")= chr [1:69567] "1" "2" "3" "4" ...
$ centers : num [1:8, 1:11] -0.21 -0.398 -0.331 -0.359 -0.276 ...
..- attr(*, "dimnames")=List of 2
.. ..$ : chr [1:8] "1" "2" "3" "4" ...
.. ..$ : chr [1:11] "Hispanic" "White" "Black" "Asian" ...
$ totss : num 7.31e+12
$ withinss : num [1:8] 2.38e+10 2.24e+10 2.47e+10 2.28e+10 2.44e+10 ...
$ tot.withinss: num 1.95e+11
$ betweenss : num 7.12e+12
$ size : int [1:8] 13055 3152 7742 5140 10623 6131 13192 10532
$ iter : int 2
$ ifault : int 0
- attr(*, "class")= chr "kmeans"
K-means clustering with 8 clusters of sizes 13055, 3152, 7742, 5140, 10623, 6131, 13192, 10532
Cluster means:
Hispanic White Black Asian Professional Service
1 -0.2099021 0.31065143 -0.17687237 -0.08368332 -0.08389187 -0.09881344
2 -0.3981141 0.45578358 -0.40960492 0.53311507 1.81785638 -1.08784127
3 -0.3314624 0.43062774 -0.31765082 0.20067007 0.83132143 -0.55740904
4 -0.3588452 0.42873072 -0.36110010 0.40713285 1.35698554 -0.82340123
5 -0.2764009 0.38868612 -0.25353852 0.02632578 0.34875186 -0.33062038
6 1.1604954 -1.34363273 0.72389507 -0.20529827 -1.11835632 1.17173776
Office Construction Production Unemployment IncomePerCap
1 0.06138421 0.1289094 0.1064936 -0.23047133 25340.47
2 -0.30249732 -0.9092239 -1.1486967 -0.66205029 50788.01
3 0.03930789 -0.3810941 -0.6273035 -0.52118182 35892.73
4 -0.10289042 -0.6828886 -0.9379571 -0.61013558 42677.39
5 0.09089521 -0.1005031 -0.2648395 -0.40948506 30223.86
6 -0.08558404 0.3344531 0.5598170 1.46755772 10698.00
[ reached getOption("max.print") -- omitted 2 rows ]
Clustering vector:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26
1 8 7 1 1 5 7 5 1 1 1 7 7 8 1 5 7 8 3 3 1 1 5 7 7 7
27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 45 46 47 48 49 50 51 52 53
8 7 5 5 4 5 3 7 3 4 1 5 5 1 7 7 1 7 6 8 8 8 8 7 7 7
54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76
8 7 7 8 7 7 7 8 7 8 7 1 7 8 8 7 7 1 7 7 6 7 7
[ reached getOption("max.print") -- omitted 69492 entries ]
Within cluster sum of squares by cluster:
[1] 23806194051 22351110068 24674184543 22763698227 24425126210 32229516603
[7] 22734152115 22364091140
(between_SS / total_SS = 97.3 %)
Available components:
[1] "cluster" "centers" "totss" "withinss" "tot.withinss"
[6] "betweenss" "size" "iter" "ifault"

List of 9
$ cluster : Named int [1:69567] 7 3 3 7 9 1 3 1 7 7 ...
..- attr(*, "names")= chr [1:69567] "1" "2" "3" "4" ...
$ centers : num [1:9, 1:11] -0.294 -0.347 0.049 -0.402 1.22 ...
..- attr(*, "dimnames")=List of 2
.. ..$ : chr [1:9] "1" "2" "3" "4" ...
.. ..$ : chr [1:11] "Hispanic" "White" "Black" "Asian" ...
$ totss : num 7.31e+12
$ withinss : num [1:9] 1.54e+10 1.52e+10 1.73e+10 1.58e+10 2.42e+10 ...
$ tot.withinss: num 1.55e+11
$ betweenss : num 7.16e+12
$ size : int [1:9] 8025 5905 11753 2719 4994 4354 12230 9140 10447
$ iter : int 3
$ ifault : int 0
- attr(*, "class")= chr "kmeans"
K-means clustering with 9 clusters of sizes 8025, 5905, 11753, 2719, 4994, 4354, 12230, 9140, 10447
Cluster means:
Hispanic White Black Asian Professional Service
1 -0.29398811 0.4084707 -0.2881384 0.09801206 0.5412472 -0.41887898
2 -0.34732990 0.4326622 -0.3266366 0.26353963 0.9956851 -0.63591035
3 0.04903299 -0.1030093 0.1312846 -0.13421926 -0.5427295 0.27794329
4 -0.40241766 0.4583190 -0.4144673 0.54771279 1.8491328 -1.10706894
5 1.21957721 -1.3726251 0.7042742 -0.21976591 -1.1201758 1.19133726
6 -0.35966388 0.4293714 -0.3692219 0.42709539 1.4300252 -0.86183629
Office Construction Production Unemployment IncomePerCap
1 0.092171984 -0.214882278 -0.42684953 -0.4601316 32552.93
2 -0.003486057 -0.478374937 -0.72844821 -0.5525047 37694.40
3 0.017290823 0.273752980 0.44804925 0.1820710 19710.62
4 -0.310870838 -0.926207144 -1.16439186 -0.6642217 51363.70
5 -0.083922955 0.335218762 0.54026333 1.5507180 10154.85
6 -0.136970585 -0.719826670 -0.97229327 -0.6191718 43867.61
[ reached getOption("max.print") -- omitted 3 rows ]
Clustering vector:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26
7 3 3 7 9 1 3 1 7 7 7 3 3 3 7 9 7 8 2 2 9 9 9 3 7 7
27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 45 46 47 48 49 50 51 52 53
3 3 1 9 2 1 2 3 2 6 7 1 1 7 3 3 9 3 5 8 8 8 8 3 3 3
54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76
3 3 7 8 3 3 3 8 3 3 3 9 3 8 8 3 3 7 3 3 5 3 7
[ reached getOption("max.print") -- omitted 69492 entries ]
Within cluster sum of squares by cluster:
[1] 15416927509 15194151787 17298838728 15761413559 24236871148 16554689443
[7] 17160968891 17348442911 16348725566
(between_SS / total_SS = 97.9 %)
Available components:
[1] "cluster" "centers" "totss" "withinss" "tot.withinss"
[6] "betweenss" "size" "iter" "ifault"
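
The percentage reported as between_SS / total_SS can be recovered from the printed components, since between_SS = total_SS - tot.withinss. The following is a minimal sketch in R, assuming the rounded totss value of 7.31e+12 from the str() output and the nine within-cluster sums of squares as printed for k = 9; the variable names are illustrative only.

```r
# Within-cluster sums of squares, copied from the k = 9 output above
withinss <- c(15416927509, 15194151787, 17298838728, 15761413559,
              24236871148, 16554689443, 17160968891, 17348442911,
              16348725566)

# totss is only shown to three significant figures in the str() output,
# so the result is approximate
totss <- 7.31e12

tot_withinss <- sum(withinss)          # about 1.55e11
betweenss    <- totss - tot_withinss   # about 7.16e12
ratio_pct    <- 100 * betweenss / totss

print(round(ratio_pct, 1))
```

Because the IncomePerCap centers are on their original scale while the other columns appear to be standardized, this ratio is probably dominated by income.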

List of 9
$ cluster : Named int [1:69567] 9 2 8 9 5 5 8 10 9 9 ...
..- attr(*, "names")= chr [1:69567] "1" "2" "3" "4" ...
$ centers : num [1:10, 1:11] 1.293 0.172 -0.361 0.733 -0.269 ...
..- attr(*, "dimnames")=List of 2
.. ..$ : chr [1:10] "1" "2" "3" "4" ...
.. ..$ : chr [1:11] "Hispanic" "White" "Black" "Asian" ...
$ totss : num 7.31e+12
$ withinss : num [1:10] 1.67e+10 1.25e+10 1.39e+10 1.17e+10 1.19e+10 ...
$ tot.withinss: num 1.28e+11
$ betweenss : num 7.18e+12
$ size : int [1:10] 3740 9868 3958 7455 8799 2488 5279 10782 10229 6969
$ iter : int 2
$ ifault : int 4
- attr(*, "class")= chr "kmeans"
K-means clustering with 10 clusters of sizes 3740, 9868, 3958, 7455, 8799, 2488, 5279, 10782, 10229, 6969
Cluster means:
Hispanic White Black Asian Professional Service
1 1.29347073 -1.3926502 0.65110535 -0.227144565 -1.10439711 1.2206491
2 0.17193979 -0.2995527 0.27181346 -0.131665132 -0.65475827 0.4157717
3 -0.36067127 0.4329611 -0.37594612 0.433931771 1.47068572 -0.8852975
4 0.73251794 -1.0447516 0.74170741 -0.163014767 -1.02683185 0.9384102
5 -0.26944247 0.3834500 -0.24218980 -0.002215569 0.27358175 -0.2989392
6 -0.40143948 0.4574908 -0.41624907 0.552865396 1.86667764 -1.1156001
Office Construction Production Unemployment IncomePerCap
1 -0.076264666 0.32954885 0.47907560 1.65962676 9446.23
2 -0.005854058 0.28803780 0.50879858 0.34460804 18266.13
3 -0.150092261 -0.74228280 -0.99238368 -0.63110451 44529.44
4 -0.088022816 0.32539421 0.65367907 0.92950639 14162.91
5 0.094618993 -0.05725246 -0.20068842 -0.38834330 29357.81
6 -0.323599903 -0.93488726 -1.16996086 -0.66198939 51688.15
[ reached getOption("max.print") -- omitted 4 rows ]
Clustering vector:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26
9 2 8 9 5 5 8 10 9 9 8 8 2 2 9 5 8 4 10 7 9 9 5 2 8 8
27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 45 46 47 48 49 50 51 52 53
2 2 5 5 7 10 10 8 10 3 8 5 10 8 2 2 9 2 4 2 4 2 4 2 8 2
54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76
2 2 8 4 2 8 8 2 2 2 8 9 2 4 2 8 2 9 2 8 4 2 8
[ reached getOption("max.print") -- omitted 69492 entries ]
Within cluster sum of squares by cluster:
[1] 16668900913 12479477177 13910959117 11660948482 11895068060 12674139695
[7] 12952720882 11959674961 11678714620 12133241415
(between_SS / total_SS = 98.2 %)
Available components:
[1] "cluster" "centers" "totss" "withinss" "tot.withinss"
[6] "betweenss" "size" "iter" "ifault"


KNN
Preprocessing KNN


Factor w/ 2 levels "[855,2.47e+04]",..: 2 1 1 1 2 2 1 2 1 1 ...
[1] "factor"
[1] "[855,2.47e+04]" "(2.47e+04,5.6e+04]"
Factor w/ 4 levels "[855,1.88e+04]",..: 3 1 2 2 3 3 2 4 2 2 ...
[1] "factor"
[1] "[855,1.88e+04]" "(1.88e+04,2.47e+04]" "(2.47e+04,3.23e+04]"
[4] "(3.23e+04,5.6e+04]"
'data.frame': 69567 obs. of 12 variables:
$ Hispanic : num 0.9 0.8 0 10.5 0.7 13.1 3.8 1.3 1.4 0.4 ...
$ White : num 87.4 40.4 74.5 82.8 68.5 72.9 74.5 84 89.5 85.5 ...
$ Black : num 7.7 53.3 18.6 3.7 24.8 11.9 19.7 10.7 8.4 12.1 ...
$ Asian : num 0.6 2.3 1.4 0 3.8 0 0 0 0 0.3 ...
$ Professional: num 34.7 22.3 31.4 27 49.6 24.2 19.5 42.8 31.5 29.3 ...
$ Service : num 17 24.7 24.9 20.8 14.2 17.5 29.6 10.7 17.5 13.7 ...
$ Office : num 21.3 21.5 22.1 27 18.2 35.4 25.3 34.2 26.1 17.7 ...
$ Construction: num 11.9 9.4 9.2 8.7 2.1 7.9 10.1 5.5 7.8 11 ...
$ Production : num 15.2 22 12.4 16.4 15.8 14.9 15.5 6.8 17.1 28.3 ...
$ Unemployment: num 5.4 13.3 6.2 10.8 4.2 10.9 11.4 8.2 8.7 7.2 ...
$ ipc2 : Factor w/ 2 levels "[855,2.47e+04]",..: 2 1 1 1 2 2 1 2 1 1 ...
$ ipc4 : Factor w/ 4 levels "[855,1.88e+04]",..: 3 1 2 2 3 3 2 4 2 2 ...
[1] 0
[1] 0
[1] 12
KNN Model
Train-Test split of roughly 2:1 (the test set holds 22,836 of the 69,567 observations)
KNN 2 categories
Selecting the correct “k”
How does “k” affect classification accuracy? Let’s create a function that calculates classification accuracy for a given value of “k.”
num [1:2, 1:15] 1 0.796 3 0.823 5 ...
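
A function of this kind can be sketched as follows. This is only an illustration under stated assumptions: it uses `knn()` from the `class` package (the `nn.index` attribute in the results suggests a different kNN implementation was used in the original analysis), and the data, the split, and the names `knn_accuracy`, `ks`, and `acc_by_k` are hypothetical stand-ins.

```r
library(class)  # provides knn(); a recommended package shipped with R

# Hypothetical stand-in data: two numeric predictors and a binary label
set.seed(1)
n <- 300
x <- data.frame(x1 = rnorm(n), x2 = rnorm(n))
y <- factor(ifelse(x$x1 + x$x2 + rnorm(n, sd = 0.5) > 0, "High", "Low"))

# Random split into training and test rows
test_idx <- sample(n, size = n %/% 3)
train_x <- x[-test_idx, ]
test_x  <- x[test_idx, ]
train_y <- y[-test_idx]
test_y  <- y[test_idx]

# Classification accuracy on the test set for a given k
knn_accuracy <- function(k) {
  pred <- knn(train = train_x, test = test_x, cl = train_y, k = k)
  mean(pred == test_y)
}

# Odd values of k from 1 to 29, giving a 2 x 15 matrix of (k, accuracy)
ks <- seq(1, 29, by = 2)
acc_by_k <- rbind(k = ks, accuracy = sapply(ks, knn_accuracy))
str(acc_by_k)
```

Since kNN is distance-based, predictors on different scales would normally be standardized before the accuracies are compared.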

Results
Factor w/ 2 levels "[855,2.47e+04]",..: 1 1 1 1 1 2 2 1 1 1 ...
- attr(*, "nn.index")= int [1:22836, 1:9] 31430 8744 21004 2152 14716 18952 43436 37471 18814 14542 ...
- attr(*, "nn.dist")= num [1:22836, 1:9] 0.569 0.47 0.541 0.497 0.401 ...
[1] 22836
dat_pred_ipc2
[855,2.47e+04] High
11177 11659
dat_ipc2.testLabels
dat_pred_ipc2 [855,2.47e+04] High
[855,2.47e+04] 9492 1685
High 1940 9719
[1] 22836
[1] 9492 9719
[1] 0.8412594
Accuracy Kappa AccuracyLower AccuracyUpper AccuracyNull
8.412594e-01 6.825270e-01 8.364545e-01 8.459773e-01 5.006131e-01
AccuracyPValue McnemarPValue
0.000000e+00 2.457036e-05
Sensitivity Specificity Pos Pred Value
0.8303009 0.8522448 0.8492440
Neg Pred Value Precision Recall
0.8336049 0.8492440 0.8303009
F1 Prevalence Detection Rate
0.8396656 0.5006131 0.4156595
Detection Prevalence Balanced Accuracy
0.4894465 0.8412729
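
The headline statistics above follow directly from the 2 x 2 confusion matrix. The sketch below recomputes accuracy, Cohen's kappa, sensitivity, and specificity in base R. The label "Low" is a shorthand introduced here for the level printed as [855,2.47e+04], and the first level is assumed to be the positive class, as the reported sensitivity implies.

```r
# Confusion matrix from the output above: rows are predictions,
# columns are the true test labels ("Low" is shorthand for [855,2.47e+04])
cm <- matrix(c(9492, 1940, 1685, 9719), nrow = 2,
             dimnames = list(pred  = c("Low", "High"),
                             truth = c("Low", "High")))

n        <- sum(cm)
accuracy <- sum(diag(cm)) / n

# Cohen's kappa: agreement beyond what the marginal totals alone would give
expected  <- sum(rowSums(cm) * colSums(cm)) / n^2
kappa_hat <- (accuracy - expected) / (1 - expected)

# The first level ("Low") is treated as the positive class
sensitivity <- cm[1, 1] / sum(cm[, 1])
specificity <- cm[2, 2] / sum(cm[, 2])

print(round(c(accuracy = accuracy, kappa = kappa_hat,
              sensitivity = sensitivity, specificity = specificity), 4))
```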
KNN 4 categories
Selecting the correct “k”
How does “k” affect classification accuracy? Let’s create a function that calculates classification accuracy for a given value of “k.”
num [1:2, 1:15] 1 0.563 3 0.597 5 ...

Results
Factor w/ 4 levels "[855,1.88e+04]",..: 1 2 2 1 2 3 3 1 1 2 ...
[1] 22836
dat_pred_ipc4
[855,1.88e+04] Mid-Low (2.47e+04,3.23e+04] (3.23e+04,5.6e+04]
5284 5917 5721 5914
                     dat_ipc4.testLabels
dat_pred_ipc4        [855,1.88e+04] Mid-Low (2.47e+04,3.23e+04] (3.23e+04,5.6e+04]
[855,1.88e+04]                 4128     981                 159                 16
Mid-Low                        1293    3015                1440                169
(2.47e+04,3.23e+04]             237    1483                2919               1082
(3.23e+04,5.6e+04]               65     230                1215               4404
[1] 22836
[1] 4128 3015 2919 4404
[1] 0.6334735
Accuracy Kappa AccuracyLower AccuracyUpper AccuracyNull
6.334735e-01 5.113148e-01 6.271853e-01 6.397277e-01 2.510510e-01
AccuracyPValue McnemarPValue
0.000000e+00 1.805656e-20
Sensitivity Specificity Pos Pred Value
Class: [855,1.88e+04] 0.7213000 0.9324490 0.7812263
Class: Mid-Low 0.5281135 0.8305599 0.5095488
Class: (2.47e+04,3.23e+04] 0.5091575 0.8361691 0.5102255
Class: (3.23e+04,5.6e+04] 0.7765826 0.9120303 0.7446737
Neg Pred Value Precision Recall F1
Class: [855,1.88e+04] 0.9091272 0.7812263 0.7213000 0.7500681
Class: Mid-Low 0.8407707 0.5095488 0.5281135 0.5186651
Class: (2.47e+04,3.23e+04] 0.8355828 0.5102255 0.5091575 0.5096909
Class: (3.23e+04,5.6e+04] 0.9251271 0.7446737 0.7765826 0.7602935
Prevalence Detection Rate Detection Prevalence
Class: [855,1.88e+04] 0.2506131 0.1807672 0.2313890
Class: Mid-Low 0.2500000 0.1320284 0.2591084
Class: (2.47e+04,3.23e+04] 0.2510510 0.1278245 0.2505255
Class: (3.23e+04,5.6e+04] 0.2483360 0.1928534 0.2589771
Balanced Accuracy
Class: [855,1.88e+04] 0.8268745
Class: Mid-Low 0.6793367
Class: (2.47e+04,3.23e+04] 0.6726633
Class: (3.23e+04,5.6e+04] 0.8443065
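
The per-class figures for the four-category model can be reproduced from the 4 x 4 confusion matrix. A minimal base R sketch follows, with the counts copied from the table above. The labels "Low", "Mid-High", and "High" are shorthand introduced here for the interval levels.

```r
# Shorthand names for the four income levels, lowest to highest
labels <- c("Low", "Mid-Low", "Mid-High", "High")

# Rows are predictions, columns are the true test labels
cm4 <- matrix(c(4128,  981,  159,   16,
                1293, 3015, 1440,  169,
                 237, 1483, 2919, 1082,
                  65,  230, 1215, 4404),
              nrow = 4, byrow = TRUE,
              dimnames = list(pred = labels, truth = labels))

accuracy  <- sum(diag(cm4)) / sum(cm4)
recall    <- diag(cm4) / colSums(cm4)   # sensitivity per class
precision <- diag(cm4) / rowSums(cm4)   # positive predictive value per class
f1        <- 2 * precision * recall / (precision + recall)

print(round(accuracy, 4))
print(round(cbind(recall, precision, f1), 4))
```

The two middle categories have recall and precision near 0.5, while the lowest and highest categories are above 0.7. Most of the misclassifications fall in adjacent income bands.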
Ridge and Lasso Regression
[1] 11 100

Ridge lambda value at index 50 of the lambda grid:
[1] 11497.57
Ridge coefficients for lambda at index 50:
(Intercept) Hispanic White Black Asian Professional
26167.8189 -689.1002 841.0879 -539.0475 479.8605 2185.3426
Service Office Construction Production Unemployment
-1484.7131 -283.9175 -885.1962 -1418.9224 -1231.2567
L2 norm of the Ridge coefficients (excluding the intercept) for lambda at index 50:
[1] 3616.177
Ridge lambda value at index 60 of the lambda grid:
[1] 705.4802
Ridge coefficients for lambda at index 60:
(Intercept) Hispanic White Black Asian Professional
26167.8189 -424.2205 1148.1987 -175.4403 575.0376 3268.5062
Service Office Construction Production Unemployment
-2208.5207 -697.1881 -1146.3519 -2030.7716 -1606.5620
L2 norm of the Ridge coefficients (excluding the intercept) for lambda at index 60:
[1] 5091.732
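
The scalar printed after each set of Ridge coefficients (3616.177 and 5091.732) matches the Euclidean (L2) norm of the coefficients with the intercept excluded. The sketch below checks this in base R using the coefficients as printed; the helper name `l2_norm` is illustrative.

```r
# Ridge coefficients (intercept excluded) for the two lambda values above
coef_50 <- c(Hispanic = -689.1002, White = 841.0879, Black = -539.0475,
             Asian = 479.8605, Professional = 2185.3426,
             Service = -1484.7131, Office = -283.9175,
             Construction = -885.1962, Production = -1418.9224,
             Unemployment = -1231.2567)

coef_60 <- c(Hispanic = -424.2205, White = 1148.1987, Black = -175.4403,
             Asian = 575.0376, Professional = 3268.5062,
             Service = -2208.5207, Office = -697.1881,
             Construction = -1146.3519, Production = -2030.7716,
             Unemployment = -1606.5620)

# Euclidean length of a coefficient vector
l2_norm <- function(b) sqrt(sum(b^2))

print(l2_norm(coef_50))
print(l2_norm(coef_60))
```

The norm is larger for the smaller lambda (705.4802) than for the larger one (11497.57), which is the expected shrinkage behaviour of Ridge regression.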
(Intercept) Hispanic White Black Asian Professional
26167.8189 511.7676 2362.9118 735.6647 928.3809 3378.2481
Service Office Construction Production Unemployment
-2286.4616 -762.8110 -1152.9414 -2104.9176 -1614.7757
Train and Test sets

[1] 824.8974
lowest lambda from CV: 824.8974
MSE for best Ridge lambda: 30834392
All the coefficients :
(Intercept) Hispanic White Black Asian Professional
26167.8189 -463.2954 1103.9441 -215.2605 564.0687 3247.7449
Service Office Construction Production Unemployment
-2194.5052 -687.5156 -1144.0593 -2020.6029 -1601.7077
R^2:
[1] 0.7065955
Lasso


lowest lambda from CV: 16.19307
MSE for best Lasso lambda: 30709528
All the coefficients :
(Intercept) Hispanic White Black Asian Professional
26167.81892 13.56463 1690.00411 247.03605 712.23492 6030.24803
Service Office Construction Production Unemployment
-716.51986 367.51269 0.00000 -622.32649 -1612.95442
The non-zero coefficients :
(Intercept) Hispanic White Black Asian Professional
26167.81892 13.56463 1690.00411 247.03605 712.23492 6030.24803
Service Office Production Unemployment
-716.51986 367.51269 -622.32649 -1612.95442
R^2:
[1] 0.7077836
The lambda values selected by cross-validation are small, so the penalized fits do not deviate much from OLS. The output says 8 non-zero coefficients but lists 9, most likely because the Hispanic coefficient is very small. The effects of White and Professional are much stronger than those of the other coefficients. e^5.5 =
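
The Ridge and Lasso test MSE and R^2 values reported above can be cross-checked against each other. The sketch below assumes that R^2 was computed as 1 - MSE / (mean squared deviation of the test response around its mean) and that both models were evaluated on the same test set. Neither assumption is stated in the output, so this is a consistency check rather than a reconstruction of the original code.

```r
# Values copied from the Ridge and Lasso output above
mse_ridge <- 30834392
r2_ridge  <- 0.7065955
mse_lasso <- 30709528
r2_lasso  <- 0.7077836

# Mean squared deviation of the response implied by the Ridge results
var_y <- mse_ridge / (1 - r2_ridge)

# R^2 that the same baseline implies for the Lasso MSE
r2_lasso_implied <- 1 - mse_lasso / var_y

print(round(c(reported = r2_lasso, implied = r2_lasso_implied), 6))
```

The implied and reported values agree closely, which supports reading the unlabeled 0.7077836 as the Lasso R^2.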
Call:
lm(formula = IncomePerCap ~ ., data = datJLClean)
Residuals:
Min 1Q Median 3Q Max
-57889 -3154 -136 3093 39355
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 26167.82 20.93 1250.463 <2e-16 ***
Hispanic 973.15 87.56 11.114 <2e-16 ***
White 2983.27 117.35 25.422 <2e-16 ***
Black 1175.86 84.20 13.966 <2e-16 ***
Asian 1115.18 41.58 26.817 <2e-16 ***
Professional 921.45 4378.81 0.210 0.833
Service -3752.36 2603.40 -1.441 0.149
Office -1839.03 1898.41 -0.969 0.333
Construction -2235.99 1930.85 -1.158 0.247
Production -3487.94 2435.84 -1.432 0.152
Unemployment -1604.28 26.65 -60.195 <2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 5519 on 69556 degrees of freedom
Multiple R-squared: 0.7102, Adjusted R-squared: 0.7101
F-statistic: 1.704e+04 on 10 and 69556 DF, p-value: < 2.2e-16
Call:
lm(formula = IncomePerCap ~ Hispanic + White + Black + Asian +
Professional + Service + Office + Production + Unemployment,
data = datJLClean)
Residuals:
Min 1Q Median 3Q Max
-57889 -3155 -139 3092 39315
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 26167.82 20.93 1250.46 <2e-16 ***
Hispanic 972.98 87.56 11.11 <2e-16 ***
White 2983.19 117.35 25.42 <2e-16 ***
Black 1175.89 84.20 13.97 <2e-16 ***
Asian 1115.17 41.58 26.82 <2e-16 ***
Professional 5991.86 53.93 111.11 <2e-16 ***
Service -737.90 39.77 -18.55 <2e-16 ***
Office 359.14 28.63 12.54 <2e-16 ***
Production -667.56 40.62 -16.44 <2e-16 ***
Unemployment -1604.14 26.65 -60.19 <2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 5519 on 69557 degrees of freedom
Multiple R-squared: 0.7102, Adjusted R-squared: 0.7101
F-statistic: 1.894e+04 on 9 and 69557 DF, p-value: < 2.2e-16
MSE for full model:
[1] 30459848
MSE for full model (without Construction):
[1] 30460435